A Generalized Methodology for Data Analysis.
Abstract
Based on a critical analysis of data analytics and its foundations, we propose a functional approach to estimate data ensemble properties, which is based entirely on the empirical observations of discrete data samples and the relative proximity of these points in the data space and hence named empirical data analysis (EDA). The ensemble functions include the nonparametric square centrality (a measure of closeness used in graph theory) and typicality (an empirically derived quantity which resembles probability). A distinctive feature of the proposed new functional approach to data analysis is that it does not assume randomness or determinism of the empirically observed data, nor independence. The typicality is derived from the discrete data directly in contrast to the traditional approach, where a continuous probability density function is assumed a priori. The typicality is expressed in a closed analytical form that can be calculated recursively and, thus, is computationally very efficient. The proposed nonparametric estimators of the ensemble properties of the data can also be interpreted as a discrete form of the information potential (known from the information theoretic learning theory as well as the Parzen windows). Therefore, EDA is very suitable for the current move to a data-rich environment, where the understanding of the underlying phenomena behind the available vast amounts of data is often not clear. We also present an extension of EDA for inference. The areas of applications of the new methodology of the EDA are wide because it concerns the very foundation of data analysis. Preliminary tests show its good performance in comparison to traditional techniques.
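As a rough illustration of the quantities named in the abstract, the Python sketch below computes a typicality-like score for a small data set. The abstract gives no formulas, so everything here is an assumption: squared Euclidean distance as the proximity measure, centrality taken as the reciprocal of each sample's summed squared distances to all samples, and typicality as centrality normalized to sum to one. The function names are hypothetical. The second function relies on the identity sum_j ||x_i - x_j||^2 = K (||x_i - mu||^2 + X - ||mu||^2), where mu is the sample mean and X the mean squared norm. Both can be updated incrementally as new samples arrive, which is one way a recursive calculation could work, though not necessarily the authors' way.

```python
import numpy as np


def cumulative_proximity(X):
    """Sum of squared Euclidean distances from each sample to all samples.

    Direct pairwise version, O(K^2) in the number of samples K.
    """
    diffs = X[:, None, :] - X[None, :, :]      # shape (K, K, d)
    return np.sum(diffs ** 2, axis=(1, 2))     # shape (K,)


def cumulative_proximity_from_moments(X):
    """Same quantity, computed from the mean and the mean squared norm only.

    Assumption: squared Euclidean distance. Both moments can be maintained
    incrementally, so no pairwise distance matrix is needed.
    """
    K = X.shape[0]
    mu = X.mean(axis=0)
    mean_sq_norm = np.mean(np.sum(X ** 2, axis=1))
    return K * (np.sum((X - mu) ** 2, axis=1) + mean_sq_norm - mu @ mu)


def typicality(X):
    """Assumed definition: normalized reciprocal of cumulative proximity.

    Undefined if all samples coincide (cumulative proximity is zero).
    """
    q = cumulative_proximity_from_moments(X)
    centrality = 1.0 / q
    return centrality / centrality.sum()


# Three nearby points and one distant point.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
tau = typicality(X)
```

Under these assumptions the scores are non-negative and sum to one, and the distant fourth point receives the lowest score, without any prior assumption about a probability density.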
Similar resources
Generalized Fuzzy Inverse Data envelopment Analysis Models
Traditional DEA models do not deal with imprecise data and assume that the data for all inputs and outputs are known exactly. Inverse DEA models can be used to estimate inputs for a DMU when some or all outputs and the efficiency level of this DMU are increased or preserved. This paper studies the inverse DEA for fuzzy data. This paper proposes generalized inverse DEA in fuzzy data envelopment anal...
Bayesian paradigm for analysing count data in longitudinal studies using Poisson-generalized log-gamma model
In analyzing longitudinal data with count responses, the normal distribution is usually used for the distribution of the random effects. However, in some applications the random effects may not be normally distributed. Misspecification of this distribution may reduce the efficiency of the estimators. In this paper, a generalized log-gamma distribution is used for the random effects which includes th...
Comparison of Maximum Likelihood Estimation and Bayesian with Generalized Gibbs Sampling for Ordinal Regression Analysis of Ovarian Hyperstimulation Syndrome
Background and Objectives: Analysis of ordinal outcome data can lead to biased estimates and large variance when the data are sparse. The objective of this study is to compare parameter estimates of an ordinal regression model under maximum likelihood and a Bayesian framework with generalized Gibbs sampling. The models were used to analyze ovarian hyperstimulation syndrome data. Methods: This study use...
Which Methodology is Better for Combining Linear and Nonlinear Models for Time Series Forecasting?
Both theoretical and empirical findings have suggested that combining different models can be an effective way to improve on the predictive performance of each individual model. This holds especially when the models in the ensemble are quite different. Hybrid techniques that decompose a time series into its linear and nonlinear components are one of the most important kinds of the hybrid model...
An Image Analysis-Based Methodology for Chromite Exploration through Opto-Geometric Parameters; a Case Study in Faryab Area, SE of Iran
Traditional methods of chromite exploration are mostly based on geophysical techniques and drilling operations. They are expensive and time-consuming. Furthermore, they suffer from several shortcomings such as lack of sufficient geophysical density contrast. In order to overcome these drawbacks, the current research work is carried out to introduce a novel, automatic and opto-geometric image an...
A Non-radial Approach for Setting Integer-valued Targets in Data Envelopment Analysis
Data Envelopment Analysis (DEA) has been widely studied in the literature since its inception with the work of Charnes, Cooper and Rhodes in 1978. The methodology behind the classical DEA method is to determine how much improvement in the output (input) dimensions is necessary in order to render the units efficient. One of the underlying assumptions of this methodology is that the units consume and prod...
Journal:
- IEEE Transactions on Cybernetics
Publication date: 2017